Results 1 - 20 of 114
1.
BMC Bioinformatics ; 25(1): 94, 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38438850

ABSTRACT

BACKGROUND: Analysis of time-resolved postprandial metabolomics data can improve the understanding of metabolic mechanisms, potentially revealing biomarkers for early diagnosis of metabolic diseases and advancing precision nutrition and medicine. Postprandial metabolomics measurements at several time points from multiple subjects can be arranged as a subjects by metabolites by time points array. Traditional analysis methods are limited in terms of revealing subject groups, related metabolites, and temporal patterns simultaneously from such three-way data. RESULTS: We introduce an unsupervised multiway analysis approach based on the CANDECOMP/PARAFAC (CP) model for improved analysis of postprandial metabolomics data guided by a simulation study. Because of the lack of ground truth in real data, we generate simulated data using a comprehensive human metabolic model. This allows us to assess the performance of CP models in terms of revealing subject groups and underlying metabolic processes. We study three analysis approaches: analysis of fasting-state data using principal component analysis, T0-corrected data (i.e., data corrected by subtracting fasting-state data) using a CP model and full-dynamic (i.e., full postprandial) data using CP. Through extensive simulations, we demonstrate that CP models capture meaningful and stable patterns from simulated meal challenge data, revealing underlying mechanisms and differences between diseased versus healthy groups. CONCLUSIONS: Our experiments show that it is crucial to analyze both fasting-state and T0-corrected data for understanding metabolic differences among subject groups. Depending on the nature of the subject group structure, the best group separation may be achieved by CP models of T0-corrected or full-dynamic data. This study introduces an improved analysis approach for postprandial metabolomics data while also shedding light on the debate about correcting baseline values in longitudinal data analysis.


Subject(s)
Medicine , Metabolomics , Humans , Computer Simulation , Data Analysis , Health Status
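The T0 correction mentioned in the abstract above (subtracting fasting-state data from postprandial data) can be illustrated on a small subjects by metabolites by time points array. This is a minimal sketch, not the authors' code: the function name `t0_correct`, the toy values, and the choice to drop the all-zero fasting slice after subtraction are assumptions made for illustration, and the CP model itself is not fitted here.

```python
def t0_correct(data):
    """Subtract each subject's fasting-state value (time index 0) from
    every later time point, metabolite by metabolite.

    data[i][j][k] = subject i, metabolite j, time point k (k = 0 is fasting).
    The fasting slice is dropped from the result because it would be all
    zeros after subtraction (an illustrative choice, not taken from the paper).
    """
    corrected = []
    for subject in data:
        rows = []
        for profile in subject:
            baseline = profile[0]
            rows.append([value - baseline for value in profile[1:]])
        corrected.append(rows)
    return corrected


# Toy array: 2 subjects x 2 metabolites x 3 time points (hypothetical values).
toy = [
    [[5.0, 7.0, 6.0], [1.0, 1.5, 1.2]],
    [[4.0, 9.0, 5.0], [2.0, 2.0, 2.5]],
]
t0_corrected = t0_correct(toy)
```

The corrected array keeps the subjects and metabolites modes and has one fewer entry in the time mode; a full-dynamic analysis would instead use `toy` unchanged.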
3.
PLoS Comput Biol ; 19(6): e1011221, 2023 06.
Article in English | MEDLINE | ID: mdl-37352364

ABSTRACT

The intricate dependency structure of biological "omics" data, particularly those originating from longitudinal intervention studies with frequently sampled repeated measurements, renders the analysis of such data challenging. The high dimensionality, inter-relatedness of multiple outcomes, and heterogeneity in the studied systems all add to the difficulty in deriving meaningful information. In addition, the subtle differences in dynamics often deemed meaningful in nutritional intervention studies can be particularly challenging to quantify. In this work we demonstrate the use of quantitative longitudinal models within the repeated-measures ANOVA simultaneous component analysis+ (RM-ASCA+) framework to capture the dynamics in frequently sampled longitudinal data with multivariate outcomes. We illustrate the use of linear mixed models with polynomial and spline basis expansion of the time variable within RM-ASCA+ in order to quantify non-linear dynamics in a simulation study as well as in a metabolomics data set. We show that the proposed approach presents a convenient and interpretable way to systematically quantify and summarize multivariate outcomes in longitudinal studies while accounting for proper within-subject dependency structures.


Subject(s)
Algorithms , Metabolomics , Computer Simulation , Linear Models
4.
J Allergy Clin Immunol Pract ; 11(7): 2162-2171.e6, 2023 07.
Article in English | MEDLINE | ID: mdl-37146879

ABSTRACT

BACKGROUND: All children experience numerous episodes of illness during the first 3 years of life. Most episodes are mild and handled without medical attention but nevertheless burden the families and society. There is a large, and still unexplained, variation in the burden of illness between children. OBJECTIVE: To describe and provide a better understanding of the disease burden of common childhood diseases through a data-driven approach investigating the communalities between symptom patterns and predefined variables on predispositions, pregnancy, birth, environment, and child development. METHODS: The study is based on the prospectively followed clinical mother-child cohort COpenhagen Prospective Studies on Asthma in Childhood, which includes 700 children with daily symptom registration in the first 3 years of life, including symptoms of cough, breathlessness, wheeze, cold, pneumonia, sore throat, ear infections, gastrointestinal infections, fever, and eczema. First, we described the number of episodes of symptoms. Next, factor analysis models were used to describe the variation in symptom load in the second year of life (both based on n = 556, with >90% complete diary). Then we characterized patterns of similarity between symptoms using a graphical network model (based on n = 403, with a 3-year monthly compliance of >50%). Finally, predispositions and pregnancy, birth, environmental, and developmental factors were added to the network model. RESULTS: The children experienced a median of 17 (interquartile range, 12-23) episodes of symptoms during the first 3 years of life, of which most were respiratory tract infections (median, 13; interquartile range, 9-18). The frequency of symptoms was the highest during the second year of life. Eczema symptoms were unrelated to the other symptoms. The strongest association to respiratory symptoms was found for maternal asthma, maternal smoking during the third trimester, prematurity, and CDHR3 genotype. 
This was in contrast to the lack of associations for the well-established asthma locus at 17q21. CONCLUSIONS: Healthy young children are burdened by multiple episodes of symptoms during the first 3 years of life. Prematurity, maternal asthma, and CDHR3 genotype were among the strongest drivers of symptom burden.


Subject(s)
Asthma , Eczema , Pregnancy , Female , Humans , Child, Preschool , Prospective Studies , Asthma/epidemiology , Asthma/genetics , Cohort Studies , Dyspnea , Eczema/epidemiology , Respiratory Sounds , Cadherin Related Proteins , Membrane Proteins
5.
Metabolites ; 12(12)2022 Nov 29.
Article in English | MEDLINE | ID: mdl-36557232

ABSTRACT

Trained sensory panels are regularly used to rate food products but do not allow for data-driven approaches to steer food product development. This study evaluated the potential of a molecular-based strategy by analyzing 27 tomato soups that were enhanced with yeast-derived flavor products using a sensory panel as well as LC-MS and GC-MS profiling. These data sets were used to build prediction models for 26 different sensory attributes using partial least squares analysis. We found driving separation factors between the tomato soups and metabolites predicting different flavors. Many metabolites were putatively identified as dipeptides and sulfur-containing modified amino acids, which are scientifically described as related to umami or having "garlic-like" and "onion-like" attributes. Proposed identities of high-impact sensory markers (methionyl-proline and asparagine-leucine) were verified using MS/MS. The overall results highlighted the strength of combining sensory data and metabolomics platforms to find new information related to flavor perception in a complex food matrix.

6.
Microorganisms ; 10(11)2022 Oct 31.
Article in English | MEDLINE | ID: mdl-36363749

ABSTRACT

Increasing evidence indicates that the gut microbiome (GM) plays an important role in dyslipidemia. To date, however, no in-depth characterization of the associations between the GM and lipoprotein distributions (LPD) among adult individuals with diverse BMI has been conducted. To determine such associations, we studied blood-plasma LPD, fecal short-chain fatty acids (SCFA), and GM of 262 Danes aged 19-89 years. Stratification of LPD segregated subjects into three clusters displaying recommended levels of lipoproteins and explained by age and body mass index. Higher levels of HDL2a and HDL2b were associated with a higher abundance of Ruminococcaceae and Christensenellaceae. Increasing levels of total cholesterol, LDL-1, and LDL-2 were positively associated with Lachnospiraceae and Coriobacteriaceae, and negatively with Bacteroidaceae and Bifidobacteriaceae. Metagenome sequencing showed a higher abundance of genes for the biosynthesis of multiple B-vitamins and for SCFA metabolism among healthier LPD profiles. Metagenome-assembled genomes (MAGs) affiliated with Eggerthellaceae and Clostridiales were contributors of these genes, and their relative abundance correlated positively with larger HDL subfractions. The study demonstrates that differences in composition and metabolic traits of the GM are associated with variations in LPD among the recruited subjects. These findings provide evidence for GM considerations in future research aiming to shed light on mechanisms of the GM-dyslipidemia axis.

7.
FEMS Microbiol Ecol ; 98(2)2022 03 08.
Article in English | MEDLINE | ID: mdl-35137050

ABSTRACT

Strigolactones are endogenous plant hormones regulating plant development and are exuded into the rhizosphere when plants experience nutrient deficiency. There, they promote the mutualistic association of plants with arbuscular mycorrhizal fungi that help the plant with the uptake of nutrients from the soil. This shows that plants actively establish, through the exudation of strigolactones, mutualistic interactions with microbes to overcome inadequate nutrition. The signaling function of strigolactones could possibly extend to other microbial partners, but the effect of strigolactones on the global root and rhizosphere microbiome remains poorly understood. Therefore, we analyzed the bacterial and fungal microbial communities of 16 rice genotypes differing in their root strigolactone exudation. Using multivariate analyses, distinctive differences in the microbiome composition were uncovered depending on strigolactone exudation. Moreover, the results of regression modeling showed that structural differences in the exuded strigolactones affected different sets of microbes. In particular, orobanchol was linked to the relative abundance of Burkholderia-Caballeronia-Paraburkholderia and Acidobacteria that potentially solubilize phosphate, while 4-deoxyorobanchol was associated with the genera Dyella and Umbelopsis. With this research, we provide new insight into the role of strigolactones in the interplay between plants and microbes in the rhizosphere.


Subject(s)
Microbiota , Mycorrhizae , Oryza , Lactones/analysis , Lactones/chemistry , Lactones/pharmacology , Plant Roots/chemistry , Rhizosphere , Symbiosis
8.
BMC Bioinformatics ; 23(1): 31, 2022 Jan 10.
Article in English | MEDLINE | ID: mdl-35012453

ABSTRACT

BACKGROUND: Analysis of dynamic metabolomics data holds the promise to improve our understanding of underlying mechanisms in metabolism. For example, it may detect changes in metabolism due to the onset of a disease. Dynamic or time-resolved metabolomics data can be arranged as a three-way array with entries organized according to a subjects mode, a metabolites mode and a time mode. While such time-evolving multiway data sets are increasingly collected, revealing the underlying mechanisms and their dynamics from such data remains challenging. For such data, one of the complexities is the presence of a superposition of several sources of variation: induced variation (due to experimental conditions or inborn errors), individual variation, and measurement error. Multiway data analysis (also known as tensor factorizations) has been successfully used in data mining to find the underlying patterns in multiway data. To explore the performance of multiway data analysis methods in terms of revealing the underlying mechanisms in dynamic metabolomics data, simulated data with known ground truth can be studied. RESULTS: We focus on simulated data arising from different dynamic models of increasing complexity, i.e., a simple linear system, a yeast glycolysis model, and a human cholesterol model. We generate data with induced variation as well as individual variation. Systematic experiments are performed to demonstrate the advantages and limitations of multiway data analysis in analyzing such dynamic metabolomics data and their capacity to disentangle the different sources of variations. We choose to use simulations since we want to understand the capability of multiway data analysis methods which is facilitated by knowing the ground truth. 
CONCLUSION: Our numerical experiments demonstrate that despite the increasing complexity of the studied dynamic metabolic models, tensor factorization methods CANDECOMP/PARAFAC(CP) and Parallel Profiles with Linear Dependences (Paralind) can disentangle the sources of variations and thereby reveal the underlying mechanisms and their dynamics.


Subject(s)
Metabolomics , Computer Simulation , Humans
9.
Anal Chem ; 94(2): 628-636, 2022 01 18.
Article in English | MEDLINE | ID: mdl-34936323

ABSTRACT

Lipoprotein subfractions are biomarkers for the early diagnosis of cardiovascular diseases. The reference method for measuring lipoproteins, ultracentrifugation, is time-consuming, and there is a need to develop a rapid method for cohort screenings. This study presents partial least-squares regression models developed using 1H nuclear magnetic resonance (NMR) spectra and concentrations of lipoproteins as measured by ultracentrifugation on 316 healthy Danes. This study explores, for the first time, different regions of the 1H NMR spectrum representing signals of molecules in lipoprotein particles and different lipid species to develop parsimonious, reliable, and optimal prediction models. A total of 65 lipoprotein main and subfractions were predictable with high accuracy (Q2 > 0.6) using an optimal spectral region (1.4-0.6 ppm) containing methylene and methyl signals from lipids. The models were subsequently tested on an independent cohort of 290 healthy Swedes with predicted and reference values matching by up to 85-95%. In addition, an open software tool was developed to predict lipoprotein concentrations in human blood from standardized 1H NMR spectral recordings.


Subject(s)
Lipoproteins, LDL , Lipoproteins , Humans , Magnetic Resonance Spectroscopy/methods , Proton Magnetic Resonance Spectroscopy , Sweden
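The abstract above reports prediction quality as Q2. A common definition is Q2 = 1 - PRESS/TSS, where PRESS is the sum of squared differences between observed values and cross-validated predictions, and TSS is the total sum of squares around the mean. The sketch below assumes that definition; the abstract does not state which variant the authors used, and the function name `q2` and the toy numbers are hypothetical.

```python
def q2(y_observed, y_predicted_cv):
    """Q2 = 1 - PRESS / TSS.

    y_predicted_cv is assumed to hold cross-validated predictions,
    i.e. each value was predicted by a model that did not see it.
    """
    mean_y = sum(y_observed) / len(y_observed)
    press = sum((o - p) ** 2 for o, p in zip(y_observed, y_predicted_cv))
    tss = sum((o - mean_y) ** 2 for o in y_observed)
    return 1.0 - press / tss


# Hypothetical reference concentrations and cross-validated predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
score = q2(observed, predicted)  # PRESS = 1.0, TSS = 5.0
```

A score close to 1 means the cross-validated predictions track the reference values closely; a score at or below 0 means the model predicts no better than the mean.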
10.
PLoS Comput Biol ; 17(11): e1009585, 2021 11.
Article in English | MEDLINE | ID: mdl-34752455

ABSTRACT

Longitudinal intervention studies with repeated measurements over time are an important type of experimental design in biomedical research. Due to the advent of "omics"-sciences (genomics, transcriptomics, proteomics, metabolomics), longitudinal studies generate increasingly multivariate outcome data. Analysis of such data must take both the longitudinal intervention structure and multivariate nature of the data into account. The ASCA+-framework combines general linear models with principal component analysis and can be used to separate and visualize the multivariate effect of different experimental factors. However, this methodology has not yet been developed for the more complex designs often found in longitudinal intervention studies, which may be unbalanced, involve randomized interventions, and have substantial missing data. Here we describe a new methodology, repeated measures ASCA+ (RM-ASCA+), and show how it can be used to model metabolic changes over time, and compare metabolic changes between groups, in both randomized and non-randomized intervention studies. Tools for both visualization and model validation are discussed. This approach can facilitate easier interpretation of data from longitudinal clinical trials with multivariate outcomes.


Subject(s)
Breast Neoplasms/drug therapy , Antineoplastic Agents, Immunological/therapeutic use , Bariatric Surgery , Bevacizumab/therapeutic use , Data Interpretation, Statistical , Female , Genomics , Humans , Longitudinal Studies , Metabolomics , Proteomics , Reproducibility of Results
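The abstract above describes ASCA+ as combining general linear models with principal component analysis. The first half of that idea, splitting a data matrix into a grand mean, an effect matrix for one experimental factor, and residuals, can be sketched for the simplest case. This assumes a balanced one-factor design, where group means give the effect estimates; the unbalanced and repeated-measures designs that RM-ASCA+ targets need model-based estimates and are not covered. The function names and toy data are hypothetical, and the PCA step on the effect matrix is omitted.

```python
def column_means(rows):
    """Mean of each column of a list-of-lists matrix."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def asca_partition(X, groups):
    """Split X into grand mean + group effect matrix + residual matrix.

    Valid as written only for a balanced one-factor design.
    """
    grand = column_means(X)
    centered = [[x - g for x, g in zip(row, grand)] for row in X]
    group_means = {}
    for label in set(groups):
        members = [row for row, g in zip(centered, groups) if g == label]
        group_means[label] = column_means(members)
    # Each sample's row in the effect matrix is its group's mean profile.
    effect = [list(group_means[g]) for g in groups]
    residual = [[c - e for c, e in zip(crow, erow)]
                for crow, erow in zip(centered, effect)]
    return grand, effect, residual


# Toy data: 4 samples x 2 variables, two groups of two samples each.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
groups = ["a", "a", "b", "b"]
grand, effect, residual = asca_partition(X, groups)
```

In an ASCA-type analysis, a PCA of `effect` would then summarize the multivariate effect of the factor.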
11.
Anal Chim Acta ; 1185: 339073, 2021 Nov 15.
Article in English | MEDLINE | ID: mdl-34711318

ABSTRACT

In analytical chemistry, spectroscopy is attractive for high-throughput quantification, which often relies on inverse regression, such as partial least squares regression. Due to the multivariate nature of spectroscopic measurements, an analyte can be quantified in the presence of interferences. However, if the model is not fully selective against interferences, analyte predictions may be biased. The degree of model selectivity against an interferent is defined by the inner relation between the regression vector and the pure interfering signal. If the regression vector is orthogonal to the signal, this inner relation equals zero and the model is fully selective. The degree of model selectivity largely depends on calibration data quality. Strong correlations may deteriorate calibration data, resulting in poorly selective models. We show this using a fructose-maltose model system. Furthermore, we modify the NIPALS algorithm to improve model selectivity when calibration data are deteriorated. This modification is done by incorporating a projection matrix into the algorithm, which constrains regression vector estimation to the null-space of known interfering signals. This way, known interfering signals are handled, while unknown signals are accounted for by latent variables. We test the modified algorithm and compare it to the conventional NIPALS algorithm using both simulated and industrial process data. The industrial process data consist of mid-infrared measurements obtained on mixtures of beta-lactoglobulin (analyte of interest), and alpha-lactalbumin and caseinoglycomacropeptide (interfering species). The root mean squared error of beta-lactoglobulin (% w/w) predictions of a test set was 0.92 and 0.33 when applying the conventional and the modified NIPALS algorithm, respectively. Our modification of the algorithm returns simpler models with improved selectivity and analyte predictions. 
This paper shows how known interfering signals may be utilized in a direct fashion, while benefitting from a latent variable approach. The modified algorithm can be viewed as a fusion between ordinary least squares regression and partial least squares regression and may be very useful when knowledge of some (but not all) interfering species is available.


Subject(s)
Algorithms , Maltose , Calibration , Least-Squares Analysis , Spectrum Analysis
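The selectivity criterion in the abstract above (a regression vector orthogonal to the pure interfering signal gives an inner relation of zero) can be shown with a single known interferent. The sketch below is not the modified NIPALS algorithm from the paper, which builds the projection into the estimation itself; it only projects an already given vector onto the null space of one interferent to illustrate the orthogonality condition. All names and numbers are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_out(b, s):
    """Remove from b its component along s, so the result is orthogonal to s."""
    scale = dot(b, s) / dot(s, s)
    return [bi - scale * si for bi, si in zip(b, s)]


# Hypothetical regression vector and pure interferent spectrum (3 channels).
regression_vector = [0.5, 1.0, -0.25]
interferent = [1.0, 1.0, 0.0]

# A nonzero inner product means the interferent would bias predictions.
selectivity_before = dot(regression_vector, interferent)

constrained = project_out(regression_vector, interferent)
selectivity_after = dot(constrained, interferent)
```

With several known interferents, the same idea would use a projection matrix onto the orthogonal complement of the space they span.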
12.
Metabolomics ; 17(9): 77, 2021 08 25.
Article in English | MEDLINE | ID: mdl-34435244

ABSTRACT

INTRODUCTION: The relationship between the chemical composition of food products and their sensory profile is a complex association confronting many challenges. However, new untargeted methodologies are helping correlate metabolites with sensory characteristics in a simpler manner. Nevertheless, in the pilot phase of a project, where only a small set of products is used to explore the relationships, choices have to be made about the most appropriate untargeted metabolomics methodology. OBJECTIVE: To provide a framework for selecting a metabolite-sensory methodology based on the quality of measurements and the relevance of the detected metabolites, either in terms of distinguishing between products or in terms of whether they can be related to the sensory attributes of the products. METHODS: In this paper we introduce a systematic approach to explore all these different aspects driving the choice of the most appropriate metabolomics method. RESULTS: As an example we have used a tomato soup project where the choice between two sampling methods (SPME and SBSE) had to be made. The results do not consistently point to the same method as being the best: SPME was able to detect metabolites with better precision, whereas SBSE seemed to provide a better distinction between the soups. CONCLUSION: The three levels of comparison provide information on how the methods could perform in a follow-up study and will help the researcher to make a final selection of the most appropriate method based on their strengths and weaknesses.


Subject(s)
Metabolomics , Follow-Up Studies
13.
Curr Opin Biotechnol ; 70: 255-261, 2021 08.
Article in English | MEDLINE | ID: mdl-34242993

ABSTRACT

The plant microbiome plays an essential role in supporting plant growth and health, but plant molecular mechanisms underlying its recruitment are still unclear. Multi-omics data integration methods can be used to unravel new signalling relationships. Here, we review the effects of plant genetics and root exudates on root microbiome recruitment, and discuss methodological advances in data integration approaches that can help us to better understand and optimise the crop-microbiome interaction for a more sustainable agriculture.


Subject(s)
Microbiota , Agriculture , Microbiota/genetics , Plant Development , Plant Roots/genetics , Plants
14.
Anal Chem ; 92(20): 13614-13621, 2020 10 20.
Article in English | MEDLINE | ID: mdl-32991165

ABSTRACT

Metabolomics is becoming a mature part of analytical chemistry as evidenced by the growing number of publications and attendees of international conferences dedicated to this topic. Yet, a systematic treatment of the fundamental structure and properties of metabolomics data is lagging behind. We want to fill this gap by introducing two fundamental theories concerning metabolomics data: data theory and measurement theory. Our approach is to ask simple questions, the answers of which require applying these theories to metabolomics. We show that we can distinguish at least four different levels of metabolomics data with different properties and warn against confusing data with numbers. This treatment provides a theoretical underpinning for preprocessing and postprocessing methods in metabolomics and also argues for a proper match between type of metabolomics data and the biological question to be answered. The approach can be extended to other omics measurements such as proteomics and is thus of relevance for a large analytical chemistry community.


Subject(s)
Metabolomics/methods , Models, Theoretical , Chromatography, Gas , Chromatography, Liquid , Discriminant Analysis , Least-Squares Analysis , Magnetic Resonance Spectroscopy , Mass Spectrometry , Principal Component Analysis
15.
PLoS Comput Biol ; 16(9): e1008295, 2020 09.
Article in English | MEDLINE | ID: mdl-32997685

ABSTRACT

The field of transcriptomics uses and measures mRNA as a proxy of gene expression. There are currently two major platforms in use for quantifying mRNA: microarray and RNA-Seq. Many comparative studies have shown that their results are not always consistent. In this study we aim to find a robust method to increase the comparability of both platforms, enabling analysis of merged data from both platforms. We transformed high-dimensional transcriptomics data from two different platforms into a lower-dimensional and biologically relevant dataset by calculating enrichment scores based on gene set collections for all samples. We compared the similarity between data from both platforms based on the raw data and on the enrichment scores. We show that the performed transformation changes the data in a biologically relevant way and filters out noise, which leads to increased platform concordance. We validate the procedure using predictive models built with microarray-based enrichment scores to predict subtypes of breast cancer using enrichment scores based on sequenced data. Although microarray and RNA-Seq expression levels might appear different, transforming them into biologically relevant gene set enrichment scores significantly increases their correlation, which is a step forward in data integration of the two platforms. The gene set collections were shown to contain biologically relevant gene sets. More in-depth investigation of the effect of the composition, size, and number of gene sets that are used for the transformation is suggested for future research.


Subject(s)
Databases, Genetic , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis , RNA-Seq , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Female , Humans , Reproducibility of Results , Transcriptome/genetics
16.
BMC Med Res Methodol ; 20(1): 191, 2020 07 16.
Article in English | MEDLINE | ID: mdl-32677968

ABSTRACT

BACKGROUND: Vaccine clinical studies typically provide time-resolved data on adaptive response read-outs in response to the administration of that particular vaccine to a cohort of individuals. However, modeling such data is challenged by the properties of these time-resolved profiles such as non-linearity, scarcity of measurement points, scheduling of the vaccine at multiple time points. Linear Mixed Models (LMM) are often used for the analysis of longitudinal data but their use in these time-resolved immunological data is not common yet. Apart from the modeling challenges mentioned earlier, selection of the optimal model by using information-criterion-based measures is far from being straight-forward. The aim of this study is to provide guidelines for the application and selection of LMMs that deal with the challenging characteristics of the typical data sets in the field of vaccine clinical studies. METHODS: We used antibody measurements in response to Hepatitis-B vaccine with five different adjuvant formulations for demonstration purposes. We built piecewise-linear, piecewise-quadratic and cubic models with transformations of the axes with pre-selected or optimized knot locations where time is a numerical variable. We also investigated models where time is categorical and random effects are shared intercepts between different measurement points. We compared all models by using Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Deviance Information Criterion (DIC), variations of conditional AIC and by visual inspection of the model fit in the light of prior biological information. RESULTS: There are various ways of dealing with the challenges of the data which have their own advantages and disadvantages. We explain these in detail here. 
Traditional information-criteria-based measures work well for the coarse selection of the model structure and complexity but are not efficient at fine-tuning the complexity level of the random effects. CONCLUSIONS: We show that common statistical measures for optimal model complexity are not sufficient. Rather, explicitly accounting for model purpose and biological interpretation is needed to arrive at relevant models. TRIAL REGISTRATION: Clinical trial registration number for this study: NCT00805389, date of registration: December 9, 2008 (pro-active registration).


Subject(s)
Bayes Theorem , Humans
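Two of the criteria named in the abstract above have simple closed forms: AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of estimated parameters, n the number of observations, and ln L the maximized log-likelihood. The sketch below compares two hypothetical models; the log-likelihoods and parameter counts are invented for illustration, and how to count k for the random effects of a mixed model is itself a modelling decision that this sketch does not address.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: lower is better."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits: the richer model gains 3 log-likelihood units
# at the cost of 4 extra parameters.
n_obs = 100
aic_simple = aic(-120.0, 4)
aic_rich = aic(-117.0, 8)
bic_simple = bic(-120.0, 4, n_obs)
bic_rich = bic(-117.0, 8, n_obs)
```

In this toy case both criteria prefer the simpler model, with BIC penalizing the extra parameters more heavily than AIC because ln(100) exceeds 2.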
17.
Biophys J ; 119(1): 87-98, 2020 07 07.
Article in English | MEDLINE | ID: mdl-32562617

ABSTRACT

Intermediate species are hypothesized to play an important role in the toxicity of amyloid formation, a process associated with many diseases. This process can be monitored with conventional and two-dimensional infrared spectroscopy, vibrational circular dichroism, and optical and electron microscopy. Here, we present how combining these techniques provides insight into the aggregation of the hexapeptide VEALYL (Val-Glu-Ala-Leu-Tyr-Leu), the B-chain residue 12-17 segment of insulin that forms amyloid fibrils (intermolecularly hydrogen-bonded β-sheets) when the pH is lowered below 4. Under such circumstances, the aggregation commences after approximately an hour and continues to develop over a period of weeks. Singular value decompositions of one-dimensional and two-dimensional infrared spectroscopy spectra indicate that intermediate species are formed during the aggregation process. Multivariate curve resolution analyses of the one and two-dimensional infrared spectroscopy data show that the intermediates are more fibrillar and deprotonated than the monomers, whereas they are less ordered than the final fibrillar structure that is slowly formed from the intermediates. A comparison between the vibrational circular dichroism spectra and the scanning transmission electron microscopy and optical microscope images shows that the formation of mature fibrils of VEALYL correlates with the appearance of spherulites that are on the order of several micrometers, which give rise to a "giant" vibrational circular dichroism effect.


Subject(s)
Amyloid , Microscopy , Circular Dichroism , Protein Conformation, beta-Strand , Spectroscopy, Fourier Transform Infrared , Vibration
18.
Mol Omics ; 16(3): 231-242, 2020 06 15.
Article in English | MEDLINE | ID: mdl-32211690

ABSTRACT

Rapid progress in high-throughput glycomics analysis enables researchers to conduct large sample studies. Typically, the between-subject differences in total abundance of raw glycomics data are very large, and it is necessary to reduce the differences to make measurements comparable across samples. Essentially, there are two ways to approach this issue: row-wise and column-wise normalization. In glycomics, the differences per subject are usually forced to be exactly zero by scaling each sample so that the sum of all glycan intensities equals 100%. This total area (row-wise) normalization (TA) results in so-called compositional data, rendering many standard multivariate statistical methods inappropriate or inapplicable. Ignoring the compositional nature of the data, moreover, may lead to spurious results. Alternatively, a log-transformation of the raw data can be performed prior to column-wise normalization and the application of standard statistical tools. Until now, there has been no clear consensus on the appropriate normalization method for glycomics data, nor is a systematic investigation of the impact of TA on downstream analysis available to justify the choice of TA. Our motivation lies in efficient variable selection to identify glycan biomarkers with regard to accurate prediction as well as interpretability of the model chosen. Via extensive simulations we investigate how different normalization methods affect the performance of variable selection, and compare their performance. We also address the effect of various types of measurement error in glycans: additive, multiplicative and two-component error. We show that when sample-wise differences are not large, row-wise normalization (like TA) can have deleterious effects on variable selection and prediction.


Subject(s)
Biomarkers/analysis , Glycomics/methods , Algorithms , Calibration , Mass Spectrometry
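The two normalization routes contrasted in the abstract above can be put side by side on a toy intensity table. In this sketch, total area normalization scales each sample (row) to sum to 100%, and the alternative takes logarithms and then centers each glycan (column). Column centering is only one possible column-wise normalization and is an assumption here, as are the function names and the toy values.

```python
import math


def total_area_normalize(rows):
    """Row-wise normalization: each sample's intensities sum to 100."""
    return [[100.0 * v / sum(row) for v in row] for row in rows]


def log_then_column_center(rows):
    """Log-transform, then subtract each column's mean (column-wise)."""
    logged = [[math.log(v) for v in row] for row in rows]
    n = len(logged)
    means = [sum(col) / n for col in zip(*logged)]
    return [[v - m for v, m in zip(row, means)] for row in logged]


# Two hypothetical samples; the second has twice the total abundance
# but the same relative glycan profile.
raw = [[10.0, 30.0, 60.0], [20.0, 60.0, 120.0]]
ta = total_area_normalize(raw)
centered = log_then_column_center(raw)
```

After total area normalization the two samples become identical, so the twofold difference in total abundance is removed by construction. After the log and column-centering route that difference is retained as a constant offset of ln(2)/2 in every column.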
19.
Sci Rep ; 10(1): 438, 2020 01 16.
Article in English | MEDLINE | ID: mdl-31949233

ABSTRACT

Correlation coefficients are abundantly used in the life sciences. Their use can be limited to simple exploratory analysis or to construct association networks for visualization but they are also basic ingredients for sophisticated multivariate data analysis methods. It is therefore important to have reliable estimates for correlation coefficients. In modern life sciences, comprehensive measurement techniques are used to measure metabolites, proteins, gene-expressions and other types of data. All these measurement techniques have errors. Whereas in the old days, with simple measurements, the errors were also simple, that is not the case anymore. Errors are heterogeneous, non-constant and not independent. This hampers the quality of the estimated correlation coefficients seriously. We will discuss the different types of errors as present in modern comprehensive life science data and show with theory, simulations and real-life data how these affect the correlation coefficients. We will briefly discuss ways to improve the estimation of such coefficients.


Subject(s)
Models, Statistical , Research Design , Computational Biology
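The effect of measurement error on correlation coefficients, the topic of the abstract above, can be illustrated with the simplest error model: independent additive noise with constant variance. This is a deliberately simplified sketch; the abstract stresses that real errors are heterogeneous, non-constant, and not independent, which this simulation does not reproduce. The sample size, noise levels, and seed are arbitrary choices.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 5000

# Error-free quantities with a population correlation of 0.8.
x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_true = [0.8 * x + 0.6 * rng.gauss(0.0, 1.0) for x in x_true]

# Observed quantities: independent additive error with unit variance.
x_obs = [x + rng.gauss(0.0, 1.0) for x in x_true]
y_obs = [y + rng.gauss(0.0, 1.0) for y in y_true]

r_true = pearson(x_true, y_true)
r_obs = pearson(x_obs, y_obs)
```

Under these assumptions the observed variances double while the covariance is unchanged, so the population correlation of the observed data is 0.4 rather than 0.8, and `r_obs` should come out well below `r_true`.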
20.
Metabolomics ; 16(1): 2, 2019 12 03.
Article in English | MEDLINE | ID: mdl-31797165

ABSTRACT

INTRODUCTION: Integrative analysis of multiple data sets can provide complementary information about the studied biological system. However, data fusion of multiple biological data sets can be complicated, as data sets might contain different sources of variation due to underlying experimental factors. Therefore, taking the experimental design of data sets into account could be of importance in data fusion. OBJECTIVES: In the present work, we aim to incorporate the experimental design information in the integrative analysis of multiple designed data sets. METHODS: Here we describe penalized exponential ANOVA simultaneous component analysis (PE-ASCA), a new method for integrative analysis of data sets from multiple compartments or analytical platforms with the same underlying experimental design. RESULTS: Using two simulated cases, the results of simultaneous component analysis (SCA), penalized exponential simultaneous component analysis (P-ESCA) and ANOVA-simultaneous component analysis (ASCA) are compared with those of the proposed method. Furthermore, real metabolomics data obtained from NMR analysis of two different brain tissues (hypothalamus and midbrain) from the same piglets with an underlying experimental design are investigated by PE-ASCA. CONCLUSIONS: This method provides an improved understanding of the common and distinct variation in response to different experimental factors.


Subject(s)
Metabolomics , Research Design , Algorithms , Animals , Hypothalamus/metabolism , Mesencephalon/metabolism , Nuclear Magnetic Resonance, Biomolecular , Principal Component Analysis , Swine